{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neptune SPARQL QA Chain\n",
    "\n",
    "This notebook shows use of LLM to query RDF graph in Amazon Neptune. This code uses a `NeptuneRdfGraph` class that connects with the Neptune database and loads it's schema. The `NeptuneSparqlQAChain` is used to connect the graph and LLM to ask natural language questions.\n",
    "\n",
    "Requirements for running this notebook:\n",
    "- Neptune 1.2.x cluster accessible from this notebook\n",
    "- Kernel with Python 3.9 or higher\n",
    "- For Bedrock access, ensure IAM role has this policy\n",
    "\n",
    "```json\n",
    "{\n",
    "        \"Action\": [\n",
    "            \"bedrock:ListFoundationModels\",\n",
    "            \"bedrock:InvokeModel\"\n",
    "        ],\n",
    "        \"Resource\": \"*\",\n",
    "        \"Effect\": \"Allow\"\n",
    "}\n",
    "```\n",
    "\n",
    "- S3 bucket for staging sample data, bucket should be in same account/region as Neptune."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Seed W3C organizational data\n",
    "W3C org ontology plus some instances. \n",
    "\n",
    "You will need an S3 bucket in the same region and account. Set STAGE_BUCKET to name of that bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "STAGE_BUCKET = \"<bucket-name>\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash  -s \"$STAGE_BUCKET\"\n",
    "\n",
    "rm -rf data\n",
    "mkdir -p data\n",
    "cd data\n",
    "echo getting org ontology and sample org instances\n",
    "wget http://www.w3.org/ns/org.ttl \n",
    "wget https://raw.githubusercontent.com/aws-samples/amazon-neptune-ontology-example-blog/main/data/example_org.ttl \n",
    "\n",
    "echo Copying org ttl to S3\n",
    "aws s3 cp org.ttl s3://$1/org.ttl\n",
    "aws s3 cp example_org.ttl s3://$1/example_org.ttl\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bulk-load the org ttl - both ontology and instances"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load -s s3://{STAGE_BUCKET} -f turtle --store-to loadres --run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_status {loadres['payload']['loadId']} --errors --details"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup Chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "EXAMPLES = \"\"\"\n",
    "\n",
    "<question>\n",
    "Find organizations.\n",
    "</question>\n",
    "\n",
    "<sparql>\n",
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
    "PREFIX org: <http://www.w3.org/ns/org#> \n",
    "\n",
    "select ?org ?orgName where {{\n",
    "    ?org rdfs:label ?orgName .\n",
    "}} \n",
    "</sparql>\n",
    "\n",
    "<question>\n",
    "Find sites of an organization\n",
    "</question>\n",
    "\n",
    "<sparql>\n",
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
    "PREFIX org: <http://www.w3.org/ns/org#> \n",
    "\n",
    "select ?org ?orgName ?siteName where {{\n",
    "    ?org rdfs:label ?orgName .\n",
    "    ?org org:hasSite/rdfs:label ?siteName . \n",
    "}} \n",
    "</sparql>\n",
    "\n",
    "<question>\n",
    "Find suborganizations of an organization\n",
    "</question>\n",
    "\n",
    "<sparql>\n",
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
    "PREFIX org: <http://www.w3.org/ns/org#> \n",
    "\n",
    "select ?org ?orgName ?subName where {{\n",
    "    ?org rdfs:label ?orgName .\n",
    "    ?org org:hasSubOrganization/rdfs:label ?subName  .\n",
    "}} \n",
    "</sparql>\n",
    "\n",
    "<question>\n",
    "Find organizational units of an organization\n",
    "</question>\n",
    "\n",
    "<sparql>\n",
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
    "PREFIX org: <http://www.w3.org/ns/org#> \n",
    "\n",
    "select ?org ?orgName ?unitName where {{\n",
    "    ?org rdfs:label ?orgName .\n",
    "    ?org org:hasUnit/rdfs:label ?unitName . \n",
    "}} \n",
    "</sparql>\n",
    "\n",
    "<question>\n",
    "Find members of an organization. Also find their manager, or the member they report to.\n",
    "</question>\n",
    "\n",
    "<sparql>\n",
    "PREFIX org: <http://www.w3.org/ns/org#> \n",
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> \n",
    "\n",
    "select * where {{\n",
    "    ?person rdf:type foaf:Person .\n",
    "    ?person  org:memberOf ?org .\n",
    "    OPTIONAL {{ ?person foaf:firstName ?firstName . }}\n",
    "    OPTIONAL {{ ?person foaf:family_name ?lastName . }}\n",
    "    OPTIONAL {{ ?person  org:reportsTo ??manager }} .\n",
    "}}\n",
    "</sparql>\n",
    "\n",
    "\n",
    "<question>\n",
    "Find change events, such as mergers and acquisitions, of an organization\n",
    "</question>\n",
    "\n",
    "<sparql>\n",
    "PREFIX org: <http://www.w3.org/ns/org#> \n",
    "\n",
    "select ?event ?prop ?obj where {{\n",
    "    ?org rdfs:label ?orgName .\n",
    "    ?event rdf:type org:ChangeEvent .\n",
    "    ?event org:originalOrganization ?origOrg .\n",
    "    ?event org:resultingOrganization ?resultingOrg .\n",
    "}}\n",
    "</sparql>\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "from langchain.chains.graph_qa.neptune_sparql import NeptuneSparqlQAChain\n",
    "from langchain_community.chat_models import BedrockChat\n",
    "from langchain_community.graphs import NeptuneRdfGraph\n",
    "\n",
    "host = \"<neptune-host>\"\n",
    "port = \"<neptune-port>\"\n",
    "region = \"us-east-1\"  # specify region\n",
    "\n",
    "graph = NeptuneRdfGraph(\n",
    "    host=host, port=port, use_iam_auth=True, region_name=region, hide_comments=True\n",
    ")\n",
    "\n",
    "schema_elements = graph.get_schema_elements\n",
    "# Optionally, you can update the schema_elements, and\n",
    "# load the schema from the pruned elements.\n",
    "graph.load_from_schema_elements(schema_elements)\n",
    "\n",
    "bedrock_client = boto3.client(\"bedrock-runtime\")\n",
    "llm = BedrockChat(model_id=\"anthropic.claude-v2\", client=bedrock_client)\n",
    "\n",
    "chain = NeptuneSparqlQAChain.from_llm(\n",
    "    llm=llm,\n",
    "    graph=graph,\n",
    "    examples=EXAMPLES,\n",
    "    verbose=True,\n",
    "    top_K=10,\n",
    "    return_intermediate_steps=True,\n",
    "    return_direct=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ask questions\n",
    "Depends on the data we ingested above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"\"\"How many organizations are in the graph\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"\"\"Are there any mergers or acquisitions\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"\"\"Find organizations\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"\"\"Find sites of MegaSystems or MegaFinancial\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"\"\"Find a member who is manager of one or more members.\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"\"\"Find five members and who their manager is.\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\n",
    "    \"\"\"Find org units or suborganizations of The Mega Group. What are the sites of those units?\"\"\"\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
